May 6, 2019

A short introduction to data linkage with dplyr

Data linkage with dplyr

Two datasets:

Particular matter: "pm" dataset
city pm_mg
Amman 999
Saltillo 869
Usak 814
The Bronx 284
Seoul 129
GPS coordinates of cities: "city_coords" dataset
city lng lat
Amman 31.9100 35.94800
Saltillo -100.9940 25.43400
Berlin 13.4020 52.52000
Usak 29.4030 38.67300
New York -73.9400 40.60000
Seoul 126.9838 37.53626

  • to combine these datasets, the "city" column might be used as identifier for matching
  • not all cities in "pm" also appear in "city_coords" and vice versa

Join operations

pm
city pm_mg
Amman 999
Saltillo 869
Usak 814
The Bronx 284
Seoul 129
city_coords
city lng lat
Amman 31.9100 35.94800
Saltillo -100.9940 25.43400
Berlin 13.4020 52.52000
Usak 29.4030 38.67300
New York -73.9400 40.60000
Seoul 126.9838 37.53626


Datasets can also be joined with merge, but I find dplyr easier to use.

left_join(a, b, by = <criterion>): always retains rows on the "left side" and fills up non-matching rows with NAs.

library(dplyr)
left_join(pm, city_coords, by = 'city')
##        city pm_mg       lng      lat
## 1     Amman   999   31.9100 35.94800
## 2  Saltillo   869 -100.9940 25.43400
## 3      Usak   814   29.4030 38.67300
## 4 The Bronx   284        NA       NA
## 5     Seoul   129  126.9838 37.53626

Join operations

pm
city pm_mg
Amman 999
Saltillo 869
Usak 814
The Bronx 284
Seoul 129
city_coords
city lng lat
Amman 31.9100 35.94800
Saltillo -100.9940 25.43400
Berlin 13.4020 52.52000
Usak 29.4030 38.67300
New York -73.9400 40.60000
Seoul 126.9838 37.53626


right_join(a, b, by = <criterion>): always retains rows on the "right side" and fills up non-matching rows with NAs. How many rows do you expect for a right join between pm and city_coords? How many of them will contain an NA value?

right_join(pm, city_coords, by = 'city')
##       city pm_mg       lng      lat
## 1    Amman   999   31.9100 35.94800
## 2 Saltillo   869 -100.9940 25.43400
## 3   Berlin    NA   13.4020 52.52000
## 4     Usak   814   29.4030 38.67300
## 5 New York    NA  -73.9400 40.60000
## 6    Seoul   129  126.9838 37.53626

Join operations

pm
city pm_mg
Amman 999
Saltillo 869
Usak 814
The Bronx 284
Seoul 129
city_coords
city lng lat
Amman 31.9100 35.94800
Saltillo -100.9940 25.43400
Berlin 13.4020 52.52000
Usak 29.4030 38.67300
New York -73.9400 40.60000
Seoul 126.9838 37.53626


inner_join(a, b, by = <criterion>): only retains rows that match on both sides.

How many rows do you expect for an inner join between pm and city_coords?

inner_join(pm, city_coords, by = 'city')
##       city pm_mg       lng      lat
## 1    Amman   999   31.9100 35.94800
## 2 Saltillo   869 -100.9940 25.43400
## 3     Usak   814   29.4030 38.67300
## 4    Seoul   129  126.9838 37.53626

Combining data and making a choropleth map

What is a choropleth map?

Making a choropleth map

  • data downloaded from the Berlin Senate Dept. for Urban Dev. and Housing via FIS Broker
  • includes indicators for monitoring of social development ("Monitoring Soziale Stadtentwicklung 2017"):
bln_sozind <- read_sf('data/bln_plr_sozind.geojson')
head(bln_sozind)
## Simple feature collection with 6 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 386668.2 ymin: 5817761 xmax: 390764.3 ymax: 5820432
## epsg (SRID):    25833
## proj4string:    +proj=utm +zone=33 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## # A tibble: 6 x 11
##   PLANNAME EW2016 STATUS1 STATUS2 STATUS3 STATUS4 DYNAMO1 DYNAMO2 DYNAMO3
##   <chr>    <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 Stülers… 3114      4.98    1.52    8.16   20.5    -0.19    0.14   -0.42
## 2 Großer … 192      NA      NA      NA      NA      NA      NA      NA   
## 3 Lützows… 5518      6.83    2.08   17.7    34.3    -0.5    -0.03   -0.8 
## 4 Körners… 4704      6.27    1.49   20.7    45.9    -1.33   -0.76   -1.66
## 5 Nördlic… 1135      2.06    0       1.5     3.55   -0.77   -0.74   -0.42
## 6 Wilhelm… 2190      3.64    0.89    8.45   27.3    -1.23   -0.59    0.13
## # … with 2 more variables: DYNAMO4 <dbl>, geometry <MULTIPOLYGON [m]>

Making a choropleth map

## Simple feature collection with 3 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 386668.2 ymin: 5817875 xmax: 389894.1 ymax: 5820432
## epsg (SRID):    25833
## proj4string:    +proj=utm +zone=33 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## # A tibble: 3 x 3
##   PLANNAME      STATUS4                                            geometry
##   <chr>           <dbl>                                  <MULTIPOLYGON [m]>
## 1 Stülerstraße     20.5 (((387256.6 5818552, 387258.6 5818545, 387261.3 58…
## 2 Großer Tierg…    NA   (((386767.5 5819393, 386767.3 5819393, 386767.1 58…
## 3 Lützowstraße     34.3 (((387952.6 5818275, 387974.2 5818265, 387994.8 58…
  • source: FIS Broker, "Monitoring Soziale Stadtentwicklung 2017"¹
  • aim: visualize spatial distrib. of variable STATUS4 – child poverty rate in percent²
  • dataset is already a spatial dataset containing geo-data in geometry column

¹ Note that the FIS-Broker website is kind of awkward to use and you can't link directly to dataset, but you have to search for the mentioned keywords.
² Portion of children under 15 living in household that obtains social support ("Hartz IV")

Making a choropleth map

Directly plotting the map with fill color depending on STATUS4:

ggplot() + geom_sf(aes(fill = STATUS4), data = bln_sozind)

Problem: STATUS4 is continuous and so is the color scale. For choropleth maps, discrete ranges are better for discerning the colors.

Making a choropleth map

Put the percentages into 5 discrete bins, change the color palette, adjust the appearance:

bln_sozind$child_pov_bins <- cut(bln_sozind$STATUS4, seq(0, 100, by = 20))
ggplot() + geom_sf(aes(fill = child_pov_bins), data = bln_sozind) +
    scale_fill_brewer(palette = 'OrRd', na.value = "grey90",
                      guide = guide_legend(title = 'Child pov.\nranges in pct.')) +
    coord_sf(datum = NA) +  # disable lines ("graticules") in the background
    theme_bw() + theme(axis.text = element_blank(), axis.title = element_blank(),
                       axis.ticks = element_blank())

Combining data

Most of the time, you have at least two datasets: one containing the measurement(s) you want to show, the other containing the geo-data.

Example dataset: People at risk of poverty or social exclusion by NUTS 2 regions (tgs00107) from Eurostat.

pov_risk <- read.csv('data/tgs00107_pov_risk_nuts2.csv',
                     stringsAsFactors = FALSE)
pov_risk$risk_pct_bins <- cut(pov_risk$risk_pct, seq(0, 100, by = 10))

pov_risk_2016 <- filter(pov_risk, year == 2016)   # 2016 has fewest NAs
head(pov_risk_2016)
##   nuts year risk_pct risk_pct_bins
## 1   AT 2016     18.0       (10,20]
## 2 AT11 2016     15.1       (10,20]
## 3 AT12 2016     13.0       (10,20]
## 4 AT13 2016     26.0       (20,30]
## 5 AT21 2016     16.3       (10,20]
## 6 AT22 2016     18.4       (10,20]

Combining data

We load the GeoJSON dataset for NUTS level-2 regions provided by Eurostat:

nutsrg2 <- read_sf('data/nutsrg_2.json')
st_crs(nutsrg2) <- 3857  # set the correct CRS
head(nutsrg2)
## Simple feature collection with 6 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 1347007 ymin: 4105701 xmax: 3847132 ymax: 6631069
## epsg (SRID):    3857
## proj4string:    +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs
## # A tibble: 6 x 3
##   id    na                                                         geometry
##   <chr> <chr>                                            <MULTIPOLYGON [m]>
## 1 CY00  Κύπρος      (((3592720 4172836, 3594320 4178301, 3607520 4169324, …
## 2 CZ01  Praha       (((1588619 6463231, 1592220 6468696, 1617821 6476893, …
## 3 CZ02  Střední Če… (((1686224 6537392, 1685024 6526853, 1686624 6506167, …
## 4 CZ03  Jihozápad   (((1492615 6461670, 1536217 6432006, 1539017 6395706, …
## 5 CZ04  Severozápad (((1612621 6534270, 1600220 6515144, 1520216 6485089, …
## 6 CZ05  Severových… (((1729426 6582279, 1758628 6576034, 1786629 6556127, …

Combining data

Both datasets contain a NUTS level-2 identifier so we can join them:

pov_risk_2016_geo <- left_join(nutsrg2, pov_risk_2016, by = c('id' = 'nuts'))
head(pov_risk_2016_geo)
## Simple feature collection with 6 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 1347007 ymin: 4105701 xmax: 3847132 ymax: 6631069
## epsg (SRID):    3857
## proj4string:    +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs
## # A tibble: 6 x 6
##   id    na      year risk_pct risk_pct_bins                        geometry
##   <chr> <chr>  <int>    <dbl> <fct>                      <MULTIPOLYGON [m]>
## 1 CY00  Κύπρος    NA     NA   <NA>          (((3592720 4172836, 3594320 41…
## 2 CZ01  Praha   2016     10.1 (10,20]       (((1588619 6463231, 1592220 64…
## 3 CZ02  Střed…  2016     10.3 (10,20]       (((1686224 6537392, 1685024 65…
## 4 CZ03  Jihoz…  2016     10   (0,10]        (((1492615 6461670, 1536217 64…
## 5 CZ04  Sever…  2016     19.5 (10,20]       (((1612621 6534270, 1600220 65…
## 6 CZ05  Sever…  2016     11.7 (10,20]       (((1729426 6582279, 1758628 65…

Plotting the combined data

(eu_pov_plt <- ggplot() + geom_sf(aes(fill = risk_pct_bins),
                                  data = pov_risk_2016_geo, size = 0.1) +
    scale_fill_brewer(palette = 'OrRd', na.value = "grey90",
                      guide = guide_legend(title = 'Pct. of people at\nrisk of poverty')) +
    labs(title = 'Poverty in the EU 2016',
         subtitle = 'Percentage of people at risk of poverty or social exclusion',
         caption = 'source: Eurostat') +
    coord_sf(datum = NA) +  # disable lines ("graticules") in the background
    theme_bw() + theme(axis.text = element_blank(), axis.title = element_blank(),
                       axis.ticks = element_blank()))

Plotting the combined data

Exercises 4 and 5

Choose one of the following two tasks (or make both if you like) and have a look at the handout for further hints:

4. A choropleth map of social indicators in Berlin

  • load the CSV file bln_plr_sozind_data.csv from folder 4_berlin_sozind
  • load the spatial dataset bln_plr.geojson containing the Berlin "Planungsraum" regions
  • join both datasets so that you obtain a combined spatial dataset with the social indicators for their respective regions
  • make a simple choropleth map with a social indicator of your choice
  • make choropleth map of a variable with discrete bins from a social indicator

5. A choropleth map for unemployment rates in the EU

  • load the CSV file tgs00010_unempl_nuts2.csv from folder 5_eu_unempl
  • create a subset with only the data from 2016 in it and create a variable with discrete bins for unempl_pct
  • load the spatial dataset nutsrg_2_2016_epsg3857_20M.json containing the EU NUTS level-2 regions
  • join both datasets so that you obtain a combined spatial dataset with the unemployment rates from tgs00010_unempl_nuts2.csv for their respective regions in nutsrg_2_2016_epsg3857_20M.json
  • make a choropleth map for the total unemployment rate (sex == 'T')
  • optional: make small multiples (via "facetting") by variable sex, i.e. displaying choropleth maps for the unemployment rate of women and men next to each other

Visualizing spatio-temporal data

If you also have a time-dimension in your data that you want to show, there are several options:

  • using small multiples via ggplot facetting
  • using animations via gganimate
  • making an interactive map (e.g. with a "time slider")

Visualizing spatio-temporal data

Small multiples: Dataset with time-related variable; plotted with facets (e.g. facet_wrap())

Visualizing spatio-temporal data

Animations: Use gganimate for time-lapse plot animations.

Visualizing spatio-temporal data

Geocoding and reverse geocoding via Google Maps API

What is geocoding?

Geocoding is the process of finding the geographic coordinates (longitude, latitude) for a specific query term (a full address, a postal code, a city name etc.).

  • Reichpietschufer 50, 10785 Berlin13.36509, 52.50640
  • cute cat cafe new york-73.896916, 40.706934

Reverse geocoding tries to map a set of geographic coordinates to a place name / address.

  • 13.36509, 52.50640Reichpietschufer 50, 10785 Berlin

Getting access

The Google Maps API is a web service that allows programmatic geocoding or reverse geocoding (among other things). But simply, this means:

  • you send a list of queries (e.g. addresses) to a Google server
  • the server responds with a list of geo-coordinates

As of June 2018, the Maps API is part of Google's "Could Platform". This requires you to have:

  1. A Google account (surprise!).
  2. Start a Google Cloud Platform (GCP) free trial (valid for one year).
  3. Register for billing (they want your credit card).
    They promise not to charge it after the end of the free trial…

Inside GCP, you can go to APIs & Services > Credentials to get your API key.

Using ggmap for geocoding

You need to install the package ggmap, at least of version 2.7.

library(ggmap)

# provide the Google Cloud API key here:
register_google(key = google_cloud_api_key)

places <- c('Berlin', 'Leipzig', '10317, Deutschland',
            'Reichpietschufer 50, 10785 Berlin')
# TODO: uncomment again
#place_coords <- geocode(places) %>% mutate(place = places)
#place_coords

Reverse geocoding

Take the WZB's geo-coordinates and see if we can find the address to it:

# first longitude, then latitude
# TODO: uncomment again
#revgeocode(c(13.36509, 52.50640))

Social media posts or digital photos often contain geo-coordinates that you may want to "reverse geocode".

Making your maps publication-ready

Map projections

Can you unroll a cylinder or a cone to a sheet without tearing or stretching?

Yes, easily:

cylinder cone
source: rhino3.de

Map projections

Spheres or ellipsoids cannot be unrolled or unfolded without cutting or stretching. They are "non-developable surfaces".

Coordinate reference systems (again) and projections

Map projections try to do the impossible: project points on a sphere¹ to a plane.

  • we already worked with WGS84 coordinates, which are locations on a sphere¹ defined as degrees
    • -73.94° is the longitude: about 74° West from the meridian
    • 40.6° is the latitude: about 41° North from the equator
  • points from spherical/ellipsoidal coordinate system are converted to a "flat surface" cartesian coordinate system via a map projection
  • a coordinate reference system (CRS) can define a specific map projection

¹ WGS84 actually models the Earth as ellipsoid

Map projections

All map projections distort: area, shape, direction, distance and other properties can be different than on the actual sphere.

There are numerous projections, each with different properties and trade-offs between distortions.

Example projections

Mercator projection

  • very popular
  • strong distortions when far away from equator (Scandinavia, Antarctica, etc.)

Example projections

Example projections

Goode homolosine projection

  • equal-area projection
  • no distortion around equator
  • looks a bit like the peeled orange

Why should I even care?

  • spatial datasets usually specify the CRS → you should be able to set or transform it

  • you may want to use a different projection, because:
    • a different projection gives less distortion in the area of your interest (many historical projections a Euro-centric)
    • inequal area projections give too much visual weight on certain regions (i.e. Northern and Southern regions in Mercator projection)

  • compatibility between datasets matters (e.g. country shapes in specific CRS, but geo-coded points as WGS84 coordinates)

Why should I even care?

  • ETRS89 projection is especially for (central) Europe
  • look at the axis: units are meters, not degrees!

Finding out the CRS

The CRS of a spatial dataset is usually documented as EPSG / SRID number. epsg.io provides information about all common CRS.
When you load a spatial dataset you can see its CRS / projection:

worldmap_data <- st_as_sf(map('world', plot = FALSE, fill = TRUE))
head(worldmap_data, n=2)
## Simple feature collection with 2 features and 1 field
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 19.28066 ymin: 29.39194 xmax: 74.89131 ymax: 42.64795
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##                         geometry          ID
## 1 MULTIPOLYGON (((74.89131 37... Afghanistan
## 2 MULTIPOLYGON (((20.06396 42...     Albania

Alternatively, use st_crs():

st_crs(worldmap_data)
## Coordinate Reference System:
##   EPSG: 4326 
##   proj4string: "+proj=longlat +datum=WGS84 +no_defs"

Setting a CRS

Often, the CRS is not set in the spatial dataset, so you have to do it manually:

bln_plan <- read_sf('data/Planungsraum_EPSG_25833.shp')
head(bln_plan, n=2)
## Simple feature collection with 2 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 386668.2 ymin: 5818275 xmax: 389894.1 ymax: 5820432
## epsg (SRID):    NA
## proj4string:    NA
## # A tibble: 2 x 3
##   SCHLUESSEL PLR_NAME                                              geometry
##   <chr>      <chr>                                           <MULTIPOLYGON>
## 1 01011101   "St\xfclerstr." (((387256.6 5818552, 387323.1 5818572, 387418…
## 2 01011102   "Gro\xdfer Tie… (((386767.5 5819393, 386768.3 5819389, 386769…

The documentation (and the file name) say it's EPSG 25833:

st_crs(bln_plan) <- 25833

Setting a CRS

Note that by default, ggplot displays lines ("graticules") and axes according to WGS84 coordinates, although the actual data uses a different CRS:

ggplot() + geom_sf(data = bln_plan)

Setting a CRS

Setting the datum to the same CRS in coord_sf shows axis with the same units as in the data:

ggplot() + geom_sf(data = bln_plan) + coord_sf(datum = 25833)

Transforming the CRS

Suppose we wanted to make some markers on the Berlin map. We obtained longitude/latitude coordinates for them, e.g. for the WZB:

wzb_coord <- data.frame(lng = 13.365097, lat = 52.506459)
ggplot() + geom_sf(data = bln_plan) +
    geom_point(aes(x = lng, y = lat), data = wzb_coord, color = 'red')

Transforming the CRS

Oops! What happened here?

The spatial data for Berlin has a different CRS, it doesn't use WGS84 longitude/latitude coordinates. We have to convert the WZB coordinates to the same CRS (EPSG 25833) via st_transform():

wzb_wgs84 <- st_sfc(st_point(c(wzb_coord$lng, wzb_coord$lat)), crs = 4326)
wzb_etrs89 <- as.data.frame(st_coordinates(st_transform(wzb_wgs84, 25833)))
ggplot() + geom_sf(data = bln_plan) +
    geom_point(aes(x = X, y = Y), data = wzb_etrs89, color = 'red')

Setting the display window

You may only want to show a portion of your spatial data, i.e. "zoom in" to a certain area on the map.

We use spatial data for all countries from the rnaturalearth package:

worldmap <- ne_countries(type = 'map_units', returnclass = 'sf')
str(worldmap)
## Classes 'sf' and 'data.frame':   183 obs. of  95 variables:
##  $ featurecla: chr  "Admin-0 map unit" "Admin-0 map unit" "Admin-0 map unit" "Admin-0 map unit" ...
##  $ scalerank : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ labelrank : int  6 3 7 2 2 3 3 2 2 2 ...
##  $ sovereignt: chr  "Fiji" "United Republic of Tanzania" "Western Sahara" "Canada" ...
##  $ sov_a3    : chr  "FJI" "TZA" "SAH" "CAN" ...
##  $ adm0_dif  : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ level     : int  2 2 2 2 2 2 2 3 2 2 ...
##  $ type      : chr  "Sovereign country" "Sovereign country" "Indeterminate" "Sovereign country" ...
##  $ admin     : chr  "Fiji" "United Republic of Tanzania" "Western Sahara" "Canada" ...
##  $ adm0_a3   : chr  "FJI" "TZA" "SAH" "CAN" ...
##  $ geou_dif  : int  0 0 0 0 0 0 0 1 0 0 ...
##  $ geounit   : chr  "Fiji" "Tanzania" "Western Sahara" "Canada" ...
##  $ gu_a3     : chr  "FJI" "TZA" "SAH" "CAN" ...
##  $ su_dif    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ subunit   : chr  "Fiji" "Tanzania" "Western Sahara" "Canada" ...
##  $ su_a3     : chr  "FJI" "TZA" "SAH" "CAN" ...
##  $ brk_diff  : int  0 0 1 0 0 0 0 0 0 0 ...
##  $ name      : chr  "Fiji" "Tanzania" "W. Sahara" "Canada" ...
##  $ name_long : chr  "Fiji" "Tanzania" "Western Sahara" "Canada" ...
##  $ brk_a3    : chr  "FJI" "TZA" "B28" "CAN" ...
##  $ brk_name  : chr  "Fiji" "Tanzania" "W. Sahara" "Canada" ...
##  $ brk_group : chr  NA NA NA NA ...
##  $ abbrev    : chr  "Fiji" "Tanz." "W. Sah." "Can." ...
##  $ postal    : chr  "FJ" "TZ" "WS" "CA" ...
##  $ formal_en : chr  "Republic of Fiji" "United Republic of Tanzania" "Sahrawi Arab Democratic Republic" "Canada" ...
##  $ formal_fr : chr  NA NA NA NA ...
##  $ name_ciawf: chr  "Fiji" "Tanzania" "Western Sahara" "Canada" ...
##  $ note_adm0 : chr  NA NA "Self admin." NA ...
##  $ note_brk  : chr  NA NA "Self admin.; Claimed by Morocco" NA ...
##  $ name_sort : chr  "Fiji" "Tanzania" "Western Sahara" "Canada" ...
##  $ name_alt  : chr  NA NA NA NA ...
##  $ mapcolor7 : int  5 3 4 6 4 6 2 4 6 3 ...
##  $ mapcolor8 : int  1 6 7 6 5 1 3 2 6 1 ...
##  $ mapcolor9 : int  2 2 4 2 1 6 5 3 6 3 ...
##  $ mapcolor13: int  2 2 4 2 1 1 4 1 11 13 ...
##  $ pop_est   : num  9.21e+05 5.40e+07 6.03e+05 3.56e+07 3.27e+08 ...
##  $ pop_rank  : int  11 16 11 15 17 14 15 13 17 15 ...
##  $ gdp_md_est: num  8374 150600 906 1674000 18560000 ...
##  $ pop_year  : int  2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
##  $ lastcensus: int  2007 2002 NA 2011 2010 2009 1989 NA 2010 2010 ...
##  $ gdp_year  : int  2016 2016 2007 2016 2016 2016 2016 2016 2016 2016 ...
##  $ economy   : chr  "6. Developing region" "7. Least developed region" "7. Least developed region" "1. Developed region: G7" ...
##  $ income_grp: chr  "4. Lower middle income" "5. Low income" "5. Low income" "1. High income: OECD" ...
##  $ wikipedia : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ fips_10_  : chr  "FJ" "TZ" "WI" "CA" ...
##  $ iso_a2    : chr  "FJ" "TZ" "EH" "CA" ...
##  $ iso_a3    : chr  "FJI" "TZA" "ESH" "CAN" ...
##  $ iso_a3_eh : chr  "FJI" "TZA" "ESH" "CAN" ...
##  $ iso_n3    : chr  "242" "834" "732" "124" ...
##  $ un_a3     : chr  "242" "834" "732" "124" ...
##  $ wb_a2     : chr  "FJ" "TZ" NA "CA" ...
##  $ wb_a3     : chr  "FJI" "TZA" NA "CAN" ...
##  $ woe_id    : int  23424813 23424973 23424990 23424775 23424977 -90 23424980 NA 23424846 23424747 ...
##  $ woe_id_eh : int  23424813 23424973 23424990 23424775 23424977 23424871 23424980 23424926 23424846 23424747 ...
##  $ woe_note  : chr  "Exact WOE match as country" "Exact WOE match as country" "Exact WOE match as country" "Exact WOE match as country" ...
##  $ adm0_a3_is: chr  "FJI" "TZA" "MAR" "CAN" ...
##  $ adm0_a3_us: chr  "FJI" "TZA" "SAH" "CAN" ...
##  $ adm0_a3_un: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ adm0_a3_wb: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ continent : chr  "Oceania" "Africa" "Africa" "North America" ...
##  $ region_un : chr  "Oceania" "Africa" "Africa" "Americas" ...
##  $ subregion : chr  "Melanesia" "Eastern Africa" "Northern Africa" "Northern America" ...
##  $ region_wb : chr  "East Asia & Pacific" "Sub-Saharan Africa" "Middle East & North Africa" "North America" ...
##  $ name_len  : int  4 8 9 6 24 10 10 16 9 9 ...
##  $ long_len  : int  4 8 14 6 13 10 10 16 9 9 ...
##  $ abbrev_len: int  4 5 7 4 6 4 4 6 5 4 ...
##  $ tiny      : int  NA NA NA NA NA NA 5 NA NA NA ...
##  $ homepart  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ min_zoom  : num  0 0 4.7 0 0 0 0 0 0 0 ...
##  $ min_label : num  3 3 6 1.7 1.7 3 3 3 1.7 2 ...
##  $ max_label : num  8 8 11 5.7 5.7 7 8 7.5 6.7 7 ...
##  $ ne_id     : chr  "1159320625" "1159321337" "1159321223" "1159320467" ...
##  $ wikidataid: chr  "Q712" "Q924" "Q6250" "Q16" ...
##  $ name_ar   : chr  NA NA NA NA ...
##  $ name_bn   : chr  NA NA NA NA ...
##  $ name_de   : chr  "Fidschi" "Tansania" "Westsahara" "Kanada" ...
##  $ name_en   : chr  "Fiji" "Tanzania" "Western Sahara" "Canada" ...
##  $ name_es   : chr  "Fiyi" "Tanzania" "Sahara Occidental" "Canadá" ...
##  $ name_fr   : chr  "Fidji" "Tanzanie" "Sahara occidental" "Canada" ...
##  $ name_el   : chr  NA NA NA NA ...
##  $ name_hi   : chr  NA NA NA NA ...
##  $ name_hu   : chr  "Fidzsi-szigetek" "Tanzánia" "Nyugat-Szahara" "Kanada" ...
##  $ name_id   : chr  "Fiji" "Tanzania" "Sahara Barat" "Kanada" ...
##  $ name_it   : chr  "Figi" "Tanzania" "Sahara Occidentale" "Canada" ...
##  $ name_ja   : chr  NA NA NA NA ...
##  $ name_ko   : chr  NA NA NA NA ...
##  $ name_nl   : chr  "Fiji" "Tanzania" "Westelijke Sahara" "Canada" ...
##  $ name_pl   : chr  "Fidzi" "Tanzania" "Sahara Zachodnia" "Kanada" ...
##  $ name_pt   : chr  "Fiji" "Tanzânia" "Saara Ocidental" "Canadá" ...
##  $ name_ru   : chr  NA NA NA NA ...
##  $ name_sv   : chr  "Fiji" "Tanzania" "Västsahara" "Kanada" ...
##  $ name_tr   : chr  "Fiji" "Tanzanya" "Bati Sahra" "Kanada" ...
##  $ name_vi   : chr  "Fiji" "Tanzania" "Tây Sahara" "Canada" ...
##  $ name_zh   : chr  NA NA NA NA ...
##  $ geometry  :sfc_MULTIPOLYGON of length 183; first list element: List of 3
##   ..$ :List of 1
##   .. ..$ : num [1:8, 1:2] 180 180 179 179 179 ...
##   ..$ :List of 1
##   .. ..$ : num [1:9, 1:2] 178 178 179 179 178 ...
##   ..$ :List of 1
##   .. ..$ : num [1:5, 1:2] -180 -180 -180 -180 -180 ...
##   ..- attr(*, "class")= chr  "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr  "featurecla" "scalerank" "labelrank" "sovereignt" ...

Using a subset of the data

One way to "zoom in" is to form a subset of your data, e.g:

europe <- worldmap[worldmap$continent == 'Europe',]
ggplot() + geom_sf(data = europe)

However, this has some disadvantages:

  • you probably wouldn't want to show whole Russia
  • nearby countries are not shown (e.g. Turkey, Northern Africa)

Limiting the display window

We can also use the whole dataset, but set limits on the display window via coord_sf() using the current CRS unit (here: degrees longitude/latitude):

  • xlim are the bounds for the x-axis
  • ylim are the bounds for the y-axis

Using the default Mercator projection (not appropriate for this region):

ggplot() + geom_sf(data = worldmap) +
    coord_sf(xlim = c(-20, 45), ylim = c(30, 73))

Cutting the shapes

We can also "cut" the shapes to fit into the specified display window using st_crop:

To better see the effect, I'm showing the result in ETRS89 projection after the shapes have been cut:

europe <- st_crop(worldmap, xmin = -20, xmax = 45, ymin = 30, ymax = 73)
ggplot() + geom_sf(data = europe) + coord_sf(crs = 25833)

Finding centroids of regions

Labelling regions

Labelling regions without occlusion

  • labels:
  • st_centroid
  • col_bind

Exercise 6

Further reading